An image-specific Artificial Neural Network – used for image search/recognition, face recognition, face verification, self-driving cars, and more
Its concept comes from the brain's visual cortex, where each neuron responds to a small region of the visual field (in a CNN, the filter)
The name CNN comes from “convolution”, one of the most important operations in a CNN.
The first Convolutional Neural Network was LeNet-5, which classifies hand-written digits.
Advantages compared to a DNN
When a computer sees an image, it sees pixel values.
Convolution – the mathematical combination of two functions to produce a third function.
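As a sketch of that definition, a 2-D convolution slides a small filter over the image and sums the element-wise products at each position. The code below is a minimal NumPy illustration (the filter values are an arbitrary vertical-edge detector, not from the workshop):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product of the filter and the image patch, summed
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_filter = np.array([[1., 0., -1.]] * 3)   # illustrative vertical-edge detector
print(conv2d(image, edge_filter))             # a 4x4 image and 3x3 filter give a 2x2 feature map
```

Note how the output is smaller than the input (26×26 from a 28×28 image with a 3×3 filter, below) unless padding is used.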
The flattening section of the CNN architecture flattens a pooled layer into one column so that the data can be passed to a fully-connected ANN for further processing.
Each neuron of the flatten layer represents a detected feature such as a nose, a mouth, etc.
In this CNN architecture, a pooling layer of shape (4, 4, 20) is flattened into a flatten layer of 4 x 4 x 20 = 320 values, which means it carries 320 image features.
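In NumPy terms, that flattening is just a reshape; a sketch mirroring the (4, 4, 20) example above (the zeros stand in for hypothetical pooled activations):

```python
import numpy as np

pooled = np.zeros((4, 4, 20))   # hypothetical pooled feature maps
flat = pooled.reshape(-1)       # one column of image features
print(flat.shape)               # (320,)
```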
The fully-connected layer section of CNN has three layers.
The output layer section outputs an N-dimensional vector (e.g., N = 10), where N is the number of classes the program has to choose from.
For example, for digit classification the output is a 10-class probability vector.
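Those N scores are typically passed through softmax so they form a probability distribution over the classes; a minimal NumPy sketch (the logit values are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)          # the largest score gets the largest probability
print(probs.sum())    # sums to 1 (up to float precision)
```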
## Keras for CNN
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
import numpy as np
Below is the CNN architecture that the data scientist builds for the workshop.
## Build CNN model
model_cnn = Sequential()
## First Convolution and Pooling
model_cnn.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)))
model_cnn.add(MaxPooling2D(pool_size=(2, 2)))
## Second Convolution and Pooling
model_cnn.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model_cnn.add(MaxPooling2D(pool_size=(2, 2)))
## Third Convolution and Pooling
model_cnn.add(Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same'))
model_cnn.add(MaxPooling2D(pool_size=(2, 2)))
## Flatten Layer
model_cnn.add(Flatten())
## Fully-Connected Layer
model_cnn.add(Dense(128, activation='relu'))
model_cnn.add(Dropout(0.5))
## Output Layer
model_cnn.add(Dense(2, activation='softmax'))
## Compile CNN model
model_cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_cnn.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 26, 26, 64)        640
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 64)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 13, 13, 128)       73856
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 6, 6, 128)         0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 6, 6, 256)         295168
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 3, 3, 256)         0
_________________________________________________________________
flatten_3 (Flatten)          (None, 2304)              0
_________________________________________________________________
dense_12 (Dense)             (None, 128)               295040
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_13 (Dense)             (None, 2)                 258
=================================================================
Total params: 664,962
Trainable params: 664,962
Non-trainable params: 0
_________________________________________________________________
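The parameter counts in this summary can be checked by hand: a Conv2D layer has (kernel_h × kernel_w × in_channels + 1) × out_channels parameters (the +1 is the bias), and a Dense layer has (inputs + 1) × units:

```python
def conv_params(kh, kw, in_ch, out_ch):
    return (kh * kw * in_ch + 1) * out_ch   # +1 for the bias term

print(conv_params(3, 3, 1, 64))     # 640    -> conv2d_3
print(conv_params(3, 3, 64, 128))   # 73856  -> conv2d_4
print(conv_params(3, 3, 128, 256))  # 295168 -> conv2d_5
print((2304 + 1) * 128)             # 295040 -> dense_12
print((128 + 1) * 2)                # 258    -> dense_13
```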
Introduction – a recurrent neural network (RNN) is a model that uses sequential information.
Why RNN?
Text annotation (sequence labeling): # of inputs = # of outputs
Sentiment analysis (positive/negative signal): x = text, y = 0/1 or 1 to 5
Music generation / picture description: x = a vector, y = music/text
Machine translation: x = text in English, y = text in French
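What these use cases share is the recurrence: at each time step, a vanilla RNN combines the current input with the previous hidden state. A minimal sketch (the weights here are random placeholders, not trained values):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One vanilla-RNN step: new hidden state from input and previous state."""
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

rng = np.random.default_rng(0)
Wx = rng.normal(size=(3, 4))         # input (dim 3) -> hidden (dim 4)
Wh = rng.normal(size=(4, 4))         # hidden -> hidden
b = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 inputs
    h = rnn_step(x_t, h, Wx, Wh, b)
print(h.shape)                       # (4,)
```

LSTM layers (used below) elaborate on this step with gates, but keep the same input/state recurrence.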
Introduction to NLP – an area of artificial intelligence concerned with the interaction between computers and human natural language. NLP programs computers to process and analyze natural language data.
## Python Programming for word preparation
from keras.preprocessing.text import text_to_word_sequence
texts = ['I do not like her.', 'I love her']
line = []
for text in texts:
    words = text_to_word_sequence(text)
    line.append(words)
print(line)
labels = [1,0]
[['i', 'do', 'not', 'like', 'her'], ['i', 'love', 'her']]
## Python Programming for word tokenization
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer()
tokenizer.fit_on_texts(line)
tokenized_words = tokenizer.texts_to_sequences(line)
print(tokenized_words)
[[1, 3, 4, 5, 2], [1, 6, 2]]
from keras.preprocessing.sequence import pad_sequences
padded_lines = pad_sequences(tokenized_words , maxlen=5, padding='post')
print(padded_lines )
[[1 3 4 5 2]
 [1 6 2 0 0]]
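pad_sequences with padding='post' does the equivalent of this pure-Python helper (note that Keras truncates from the front by default):

```python
def pad_post(seq, maxlen):
    """Post-pad a token-id list with zeros; truncate from the front (Keras defaults)."""
    seq = seq[-maxlen:] if len(seq) > maxlen else seq
    return seq + [0] * (maxlen - len(seq))

print(pad_post([1, 6, 2], 5))        # [1, 6, 2, 0, 0]
print(pad_post([1, 3, 4, 5, 2], 5))  # unchanged: already length 5
```

Padding to a common length is what lets the two sentences be stacked into the single (2, 5) training array used below.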
## import RNN model
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train_rnn)
X_input = tokenizer.texts_to_sequences(X_train_rnn)
model_rnn = Sequential()
model_rnn.add(Embedding(input_dim=40, output_dim=100, input_length=5))
model_rnn.add(LSTM(10))
model_rnn.add(Dense(10, activation='relu'))
model_rnn.add(Dense(1, activation='sigmoid'))  # sigmoid: softmax over a single unit always outputs 1, which drives the loss to 0
model_rnn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_rnn.summary()
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_2 (Embedding)      (None, 5, 100)            4000
_________________________________________________________________
lstm_2 (LSTM)                (None, 10)                4440
_________________________________________________________________
dense_14 (Dense)             (None, 10)                110
_________________________________________________________________
dense_15 (Dense)             (None, 1)                 11
=================================================================
Total params: 8,561
Trainable params: 8,561
Non-trainable params: 0
_________________________________________________________________
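These counts can also be verified by hand: the Embedding layer stores input_dim × output_dim weights, and an LSTM has four gates, each with input weights, recurrent weights, and a bias:

```python
def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * ((input_dim + units) * units + units)

print(40 * 100)              # 4000 -> embedding_2 (input_dim=40, output_dim=100)
print(lstm_params(100, 10))  # 4440 -> lstm_2
print((10 + 1) * 10)         # 110  -> dense_14
print((10 + 1) * 1)          # 11   -> dense_15
```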
## Final Data Preparation
X_train_rnn = padded_lines
y_train_rnn = np.array(labels)
print("X training shape : ", X_train_rnn.shape)
print("y training shape : ", y_train_rnn.shape)
X training shape :  (2, 5)
y training shape :  (2,)
## Train RNN model
model_rnn.fit(X_train_rnn, y_train_rnn, epochs=10)
Epoch 1/10
1/1 [==============================] - 1s 1s/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 2/10
1/1 [==============================] - 0s 3ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 3/10
1/1 [==============================] - 0s 4ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 4/10
1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 5/10
1/1 [==============================] - 0s 3ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 6/10
1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 7/10
1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 8/10
1/1 [==============================] - 0s 3ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 9/10
1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00 - accuracy: 0.5000
Epoch 10/10
1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00 - accuracy: 0.5000
<keras.callbacks.History at 0x2888f23bd60>
Why can experienced programmers learn another or a new language faster?
Transfer learning – a machine learning method where a pre-trained model is reused as the starting point of model development for another, similar task.
For Image Data
For Language Data
Keras Transfer Learning
## import pre-trained model
from keras.applications import vgg16
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from keras.layers import Input
from keras.models import Model, Sequential
## import VGG16 model for transfer learning
input_shape= (224, 224, 3)
vgg_model = vgg16.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
## Rebuild the VGG16 architecture layer by layer (for reference; the pre-trained vgg_model above is what is used later)
vgg16_model = Sequential()
vgg16_model.add(Conv2D(64, (3, 3), input_shape=input_shape, padding='same', activation='relu'))
vgg16_model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
vgg16_model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
vgg16_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vgg16_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vgg16_model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
vgg16_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vgg16_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vgg16_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vgg16_model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
vgg16_model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
vgg16_model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
vgg16_model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
vgg16_model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
vgg16_model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
vgg16_model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
vgg16_model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
vgg16_model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
vgg16_model.add(Flatten())
vgg16_model.add(Dense(4096, activation='relu'))
vgg16_model.add(Dense(4096, activation='relu'))
vgg16_model.add(Dense(1000, activation='softmax'))
## Summary of pre-trained VGG model
vgg_model.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 224, 224, 3)]     0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
# Create your own input format (here 224x224x3)
vgg_input = Input(shape=(224, 224, 3), name='image_input')
## add custom layers to pre-trained VGG model
X = vgg_model.output
X1 = Flatten()(X)
X2 = Dense(1024, activation='relu')(X1)
X3 = Dense(1024, activation='relu')(X2)
Target = Dense(15, activation='softmax')(X3)
model_vgg_tf = Model(vgg_model.input, Target)
model_vgg_tf.compile(loss="categorical_crossentropy", optimizer='adam', metrics=['accuracy'])
model_vgg_tf.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 224, 224, 3)]     0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten_4 (Flatten)          (None, 25088)             0
_________________________________________________________________
dense_16 (Dense)             (None, 1024)              25691136
_________________________________________________________________
dense_17 (Dense)             (None, 1024)              1049600
_________________________________________________________________
dense_18 (Dense)             (None, 15)                15375
=================================================================
Total params: 41,470,799
Trainable params: 41,470,799
Non-trainable params: 0
_________________________________________________________________
## add custom layers to pre-trained VGG model
## Build the same model with the different method
model_vgg_tf2 = Sequential()
model_vgg_tf2.add(vgg_model)
model_vgg_tf2.add(Flatten())
model_vgg_tf2.add(Dense(1024, activation='relu'))
model_vgg_tf2.add(Dense(1024, activation='relu'))
model_vgg_tf2.add(Dense(15, activation='softmax'))
model_vgg_tf2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_vgg_tf2.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Functional)           (None, 7, 7, 512)         14714688
_________________________________________________________________
flatten_5 (Flatten)          (None, 25088)             0
_________________________________________________________________
dense_19 (Dense)             (None, 1024)              25691136
_________________________________________________________________
dense_20 (Dense)             (None, 1024)              1049600
_________________________________________________________________
dense_21 (Dense)             (None, 15)                15375
=================================================================
Total params: 41,470,799
Trainable params: 41,470,799
Non-trainable params: 0
_________________________________________________________________
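Only the new head differs from the pre-trained VGG16 base; its parameter counts in the summary can be checked by hand ((inputs + 1) × units per Dense layer):

```python
flat = 7 * 7 * 512                  # VGG16's final feature map, flattened
print(flat)                         # 25088
print((flat + 1) * 1024)            # 25691136 -> first Dense(1024)
print((1024 + 1) * 1024)            # 1049600  -> second Dense(1024)
print((1024 + 1) * 15)              # 15375    -> Dense(15)
# base + head = total params
print(14714688 + (flat + 1) * 1024 + (1024 + 1) * 1024 + (1024 + 1) * 15)  # 41470799
```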
BERT uses two training strategies: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
Masked Language Modeling – predict a word at a position where it is missing.
"The first death in the US [blank] reported on February 29"
What word could fill in the blank?
## import BERT model
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
# Load pre-trained model tokenizer (vocabulary)
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenized input
text = "The first death in the US was reported on February 29."
bert_tokenized_text = bert_tokenizer.tokenize(text)
print(bert_tokenized_text)
['the', 'first', 'death', 'in', 'the', 'us', 'was', 'reported', 'on', 'february', '29', '.']
# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 6
bert_tokenized_text[masked_index] = '[MASK]'
## printing
print(bert_tokenized_text)
['the', 'first', 'death', 'in', 'the', 'us', '[MASK]', 'reported', 'on', 'february', '29', '.']
# Convert token to vocabulary indices
bert_indexed_tokens = bert_tokenizer.convert_tokens_to_ids(bert_tokenized_text)
# Convert inputs to PyTorch tensors
bert_tokens_tensor = torch.tensor([bert_indexed_tokens])
print(bert_tokens_tensor)
tensor([[1996, 2034, 2331, 1999, 1996, 2149, 103, 2988, 2006, 2337, 2756, 1012]])
# Load pre-trained model (weights)
model_bertBML = BertForMaskedLM.from_pretrained('bert-base-uncased')
model_bertBML.eval()  # evaluation mode (disables dropout)
# Predict all tokens
predictions = model_bertBML(bert_tokens_tensor)
# confirm we were able to predict 'henson'
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = bert_tokenizer.convert_ids_to_tokens([predicted_index])
print(predicted_token)
['.']
Next Sentence Prediction – predict whether the second sentence is the one that follows the first.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForNextSentencePrediction
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
sentA = "[CLS]" + " What Is and Isn’t Affected by the Government Shutdown " + "[SEP]"
sentB = " Transportation Security Administration officers checking passengers at Pittsburgh International Airport last week. The agency’s employees have called out sick in increased numbers across the country since the shutdown began. " +"[SEP]"
# sentA = "[CLS]" + " Meanwhile: For a Knife, Dagger, Sword, Machete or Zombie-Killer, Just Ask These Ladies "+ "[SEP]"
# sentB = " Whitehead’s Cutlery in Butte, Mont., is 128 years old and will gladly sharpen scissors sold generations ago. "+"[SEP]"
text = sentA + sentB
tokenized_sentA = tokenizer.tokenize(sentA)
tokenized_sentB = tokenizer.tokenize(sentB)
# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_sentA + tokenized_sentB)
segments_ids = [0] * len(tokenized_sentA) + [1] * len(tokenized_sentB)
# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
# Load pre-trained model (weights)
model = BertForNextSentencePrediction.from_pretrained('bert-large-uncased')
# Predict whether sentence B is the next sentence
preds = model(tokens_tensor, segments_tensors)
Next_sentence = torch.nn.functional.softmax(preds, dim=1)[:, 0]
Next_sentence  # close to 1 means sentence B is predicted to be the next sentence
tensor([1.0000], grad_fn=<SelectBackward>)
Your comments and questions are valued and encouraged. Please contact:
Kevin Lee
AVP of Machine Learning and AI Consultant
Kevin.lee@genpact.com
Tweet: @HelloKevinLee
LinkedIn: www.linkedin.com/in/HelloKevinLee/